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Iterative Receiver Design With Off-the-Grid Sparse 

Channel Estimation 

Thomas L. Hansen, Peter B. Jprgensen, Mihai-Alin Badiu and Bernard H. Fleury 


Abstract —In this work we design an iterative receiver that 
incorporate sparse channel estimation. State-of-the-art sparse 
channel estimators simplify the estimation problem to be a finite 
basis selection problem by restricting the multipath delays to 
a grid. Our main contribution is a receiver that is released 
from such a restriction; the delays are “off-the-grid”, i.e., they 
are estimated and tracked directly as continuous values. As a 
result our receiver does not suffer from the leakage effect, which 
destroys sparsity when the delays are restricted to a grid. We 
use the unifying framework of combined belief-propagation and 
mean-field. All parameters in the receiver are inherently esti¬ 
mated. The receiver outperforms iterative receivers embedding 
state-of-the-art sparse channel estimators in terms of both mean- 
squared error of the channel estimate and bit error rate. We 
also demonstrate that our receiver design allows for a significant 
reduction in the number of pilot signals, without incurring any 
increase in bit error rate. The receiver also adapts well to 
situations where the sparse channel assumption is violated; in this 
case its bit error rate is comparable to that of an iterative receiver 
that uses minimum mean-squared error channel estimation. 

Index Terms —Iterative receivers, message-passing algorithms, 
sparse channel estimation, off-the-grid sparse Bayesian learning. 


I. Introduction 

One of the major challenges in designing receivers for 
wireless systems is mitigation of multipath effects through 
channel estimation and equalization. To facilitate channel 
estimation, current systems embed pilot symbols into the trans¬ 
mitted signal. An example is orthogonal frequency-division 
multiplexing (OFDM) systems where a number of subcarriers 
are assigned to transmit pilot symbols. The number of pilots 
is chosen to optimize throughput as a trade-off between the 
amount of bandwidth and power allocated to pilot transmission 
and fidelity of the channel estimate. In this work we seek to 
improve upon this trade-off through a unified receiver design 
that incorporates two main ideas; a) joint channel estimation 
and decoding and b) off-the-grid sparse channel estimation. 

A. Joint Channel Estimation and Decoding 

Classical receiver design employs a functional splitting 
of the process in the receiver into independent subtasks, as 
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Fig. 1. Flowchart of classical receiver design. 


illustrated in Fig. [T] Such a structure is suboptimal, since the 
information learned from the received signal in any of the 
subtasks is only utilized in subsequent subtasks. To remedy 
this sub-optimality feedback loops can be introduced between 
the functional blocks in the receiver. This is known as the turbo 
principle |1) due to the resemblance to iterative decoding of 
turbo codes. 

Application of the turbo principle has lead to many iterative 
receiver designs, e.g. Common to these works is that 

each of the subtasks are designed independently. Typically 
each subtask optimizes an objective function that depends 
on a selected inference principle, e.g. maximum likelihood 
(ML), maximum a-posteriori probability (MAP) or minimum 
mean squared error (MMSE). The work |7| introduced receiver 
design from the perspective of inference in a factor graph. 
This allows receivers to be designed such that the subtasks 
work together to optimize a joint objective function, which 
for example could be the MAP estimate of the information 
bits. We refer to such receivers as joint channel estimation 
and decoding (JCED) receivers. Due to tractability and com¬ 
putational constraints, approximate inference methods must 
be employed for JCED receiver design. Examples of such 
approximated designs use expectation propagation in |8], 
belief propagation (BP) with approximated messages in |9], 
combined BP and mean-field (MF) in (10), [IT), relaxed BP 
in 112 and generalized approximate message-passing (GAMP) 
in 13]. 


B. Sparse Channel Estimation 

The (wireless) channel impulse response (CIR) is often 
described in terms of distinct multipath components: 

L 

g(r) = -Tz )’ (!) 

i=i 

where <5(-) gives the Dirac delta function. Here, L is the 
number of multipath components and the Zth channel multipath 
coefficient is denoted as ai £ C with corresponding multipath 
delay 77 £ IR. In propagation scenarios where the number of 
multipath components L is relatively small, the model 0 
is parsimonious and it is advantageous to perform channel 
estimation by estimating the parameters of this model, i.e.. 
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estimating L, ai and 77 for / = 1..... L. We refer to this 
approach as sparse channel estimation 0-0- 

Sparse channel estimation can be posed as a compressed 
sensing reconstruction problem 0 - Most literature on com¬ 
pressed sensing and sparse channel estimation employs a grid- 
based approximation of the model {]]), where the estimates of 
the multipath delays are confined to a discrete set of possible 
values. Commonly, a sample-spaced grid is assumed, i.e., the 
spacing between the grid points is given by the inverse of 
the system bandwidth. The grid-based approximation results 
in a leakage effect (T6) , (T9) , which causes the CIR to 
only be approximately sparse in the grid-based representation 
0, 0. From a compressed sensing point of view the 
effect of the grid-based approximation can be understood as 
a basis mismatch [22j. Many recent works have proposed 
methods for solving the sparse decomposition problem that 
arises in compressed sensing without resorting to a grid-based 
approximation; see for example (23j-(28). To the authors’ best 
knowledge, these “off-the-grid” methods have not previously 
been applied in the context of sparse channel estimation. 

C. Contributions and Prior Art 


factors in the posterior pdf and c) to reduce computational 
complexity we use a point-estimate of the multipath delays. 

We use the combined BP and MF (BP-MF) framework of 
0 as the foundation of our algorithm derivation, but also 
depart from BP-MF in several ways. We show how the BP-MF 
framework can be modified to provide approximate ML esti¬ 
mation of model parameters, how variables can be estimated 
jointly to improve convergence speed and how some variables 
can be point estimated as an approximate MAP estimate, such 
that a tractable algorithm with low computational complexity 
is obtained. In particular, a point-estimate of the multipath 
delays is found via Newton’s method as proposed in (23) . 

Our receiver only uses a few parameters (specifically the 
two parameters of the Bernoulli-Gaussian prior model, sparsity 
level p and multipath coefficient variance p) to describe the 
statistical properties of the wireless channel and these are 
inherently estimated. This is in contrast to for example the 
linear MMSE (LMMSE) channel estimators (which requires a- 
priori specification of the second-order statistics of the channel 
transfer function) 0.0 and the GAMP receiver 0 (which 
relies on the second-order statistics of the CIR at all multipath 
delays on the sample-spaced grid). 


In this work, we build on recent advances within both JCED 
iterative receiver design and off-the-grid sparse channel esti¬ 
mation to derive a receiver that merges these two techniques. 

Several prior works incorporate sparse channel estimation 
in an iterative receiver. In (20). 0 a joint sparse channel es¬ 
timation and detection scheme is proposed. Channel decoding 
is not considered in the joint processing and an EM algorithm 
is used for channel inference. The delay values are restricted 
to the sample-spaced grid. 

The works 0 > 0 consider JCED receiver design for 
OFDM systems via GAMP and relaxed BP. The multipath 
delays are restricted to the sample-spaced grid. In 1121 the 
CIRs in the numerical evaluations fulfill this restriction, thus 
avoiding any leakage effects by introducing an unrealistic 
channel model. In 0 continuous-valued delays are assumed 
and it is shown that the CIR is not sparse on the imposed 
sample-spaced grid. Numerical evidence shows that the mul¬ 
tipath coefficients follow a super-Gaussian density which is 
modelled via a two-component Gaussian mixture. One of the 
component variances is shown to be small and the use of the 
Gaussian mixture together with a sample-spaced grid can be 
seen as a surrogate of the CIR model in (T). When the variance 
of one of the Gaussian mixture components is set to zero, the 
sparse channel model on the grid is recovered. 

With channel estimation based only on the pilots, the off- 
the-grid sparse channel estimation problem is equivalent to 
that of line spectral estimation 0. The work |j24) presents a 
MF algorithm for line spectral estimation and it is shown that 
the Bernoulli-Gaussian prior (29j| is a powerful and tractable 
sparsity-inducing model for line spectral estimation based on 
MF. Our off-the-grid sparse channel estimator is inspired by 
[ 24) and use the Bernoulli-Gaussian prior model, but also dif¬ 
fers from (24) in several aspects; a) at the data subcarriers the 
observations are modulated with the unknown data symbols, 
b) we assume that the multipath component coefficients fully 


D. Notation and Contents 


We denote column vectors as a and matrices as A. Con¬ 
jugate (Hermitian) transpose is denoted as (-) H and non¬ 
conjugate transposition as (-) T . The scalar or [a]* gives 
the ith entry of vector a, while as gives a vector containing 
the entries in a at the indices in the integer set S. The set 
difference operator <S\{z} gives the index set S with index i 
removed; we abuse notation slightly and write S\i for short. 
The notation [A]$ * gives the (i, fc)th element of matrix A. We 
denote the vector a with the /th element removed as au and 
use a similar notation for matrices with columns and/or rows 
removed (e.g. [A]* for the zth row with fcth entry removed). 
The notation diag(a) denotes a matrix with the entries of a 
on the diagonal and zeros elsewhere. The indicator function 
![.] gives 1 when the condition in the brackets is fulfilled and 
0 otherwise. The notation a cx e b denotes exp(a) oc exp(6), 
which implies a = &+const. The probability density functions 
of the (vector) complex normal is defined as 

CN(a:; pi, S) = n~ dim (*)| S |-i exp(-(a; - x - pi)) 


The notation unif (x; 0, T ) gives the continuous uniform prob¬ 
ability density function on the interval [0, T] and Bern(a;; p) 
gives the Bernoulli probability mass function for x £ {0,1} 
with probability of success p. We use * to denote convolution. 

The paper is structured as follows: In Section [II] we specify 
the observation model. In Section III our approach to approxi¬ 
mate Bayesian inference is discussed. The inference algorithm 
is derived in detail in Section IV Section [V] presents the 
numerical evaluation. Conclusions are given in Section [VI] 


II. Modelling 

We consider data transmission using a single-input single¬ 
output OFDM system. Since we do not exploit any structure 
between consecutive OFDM symbols, we model the sequence 





3 


of transmitted OFDM symbols to be independent and identi¬ 
cally distributed (i.i.d.). The OFDM system transmits P pilot 
subcarriers and D data subcarriers, such that the total number 
of subcarriers per symbol is N = P + D. The sets V and V 
give the indices of the pilot and data subcarriers, respectively. 
It follows that PUT = {1,..., N} and Tn? = 0. 


A. OFDM System 


The K (equi-probable) information bits to be transmitted 
are stacked in vector u £ {0,1} A . These bits are coded by a 
rate R coder and interleaved to get the K/ II coded bits c = 
C(u). The interleaving and coding function C : {0,1} A —► 
{0, \] K / R can represent any interleaver and coder, e.g. a turbo 


low-density parity check (LDPC) 1331 or convolutional 
code. We split the coded bits c into subvectors £ {0, l}^, 
i £V, such that c 1 ' 1 contains the Q bits that are mapped to 
the ith subcarrier. The complex symbols Xi = i £ V, 

are obtained via the 2^-ary mapping A4 : {0,1}^ -> A D c C, 
where Ad is the data symbol alphabet. The pilots are selected 
in the pilot symbol alphabet Ap C €. In OFDM, Ad is typically 
a 2 q -ary quadrature amplitude modulation (QAM) alphabet 
and A P a quadrature phase shift keying (QPSK) alphabet. The 
pilot and data symbols are stacked in vector x. Vector x-p 
contains the data symbols and x-p contains the pilot symbols. 

We define the effective CIR as 


ffeff( r ) - crx(t) * g(r) * ctx(t), (2) 

where g{r), crx(t) and ctx(t) are the impulse responses of 
the channel and the anti-aliasing filters in the receiver and 
transmitter, respectively. We have the usual assumption in 
OFDM of time-limited effective CIR: 


(Al): <7eff(r)=0 for t^[ 0;T cp ], 

where Tcp is the cyclic prefix duration. 

By (Al) the OFDM system operates without inter-symbol 
interference, and we can consider a single OFDM symbol. The 
OFDM transmitter emits the baseband signal 

= f En=i x n ex p(j27rAynf) t £ [—Tcp; T sym \, ^ 
1 0 otherwise, 

where A j gives the subcarrier spacing and T sym = Aj 1 is the 
OFDM symbol length. The received signal is 

r(t) = 5eff(r) * s(t) + w(t ), (4) 

where w{t) is a white Gaussian noise process. The receiver 
samples r(i), removes the cyclic prefix and calculates the 
discrete Fourier transform to obtain the observed vector y. 
Assumption (Al) ensures that orthogonality of the subcarriers 
is preserved. It can be shown [34] that 

y = X/r + w, (5) 

where X = diag(at) and u? is a white Gaussian noise vector 
with component variance 3. The vector h contains samples of 
the (continuous) Fourier transform of g e s(r) at the subcarrier 


frequencies, i.e., its entries are 

rT sym 

hn= 5eff(r) exp(—j27rA/nr) dr, n = l,...,N. (6) 

Jo 

From 0, 0 and by the convolution theorem, we get 

h = C«(r)a, (7) 

where C is a diagonal matrix with the diagonal given as 
samples of the Fourier transform of Cxx('f) * crx( t ) at the 
subcarrier frequencies. The matrix \P(r) £ C NxL has (n, Z)th 
entry exp(— j27rA/nrj), n = 1,..., N, l = 1,..., L. For ease 
of notation, we define i/>(t;) as the Zth column of St (r). We 
have stacked the channel multipath coefficients and delays into 
vectors a = [a\,... , ct^] r £ C L and r = [ti,...,T£] t £ 
[0,7cp] a . In the following we make the common assumption 
that C = I, i.e., that the combined response of the anti-aliasing 
filters is flat over the system bandwidth. If this is not the case, 
the effects of the filters can easily be included in the dictionary 
matrix ’P(r), as shown in [351. 

We recognize from 0 that h is a superposition of complex 
sinusoids. Thus, given the data symbols in X, the estimation 
of L, a and r from 0 reduces to line spectral estimation. 

B. Validity of the CIR Model 

The channel model 0 is certainly not valid in all propaga¬ 
tion environments. It assumes that the CIR is composed of a 
small number of dominant components, while the remaining 
energy is assumed to be below the noise floor. This has 
indeed been demonstrated to be the case for the ultra-wideband 
channels that are considered for 5G wireless communications 
(36), (37) and underwater acoustic channels [38). For further 
discussion of the validity of the sparse channel assumption, see 
(HJ, (39) and references therein. As demonstrated in Sec. [V] 
our algorithm, which is developed based on this assumption, 
performs well even if the number of multipath components L 
is large (L NAfl\y) and the individual components in 0 
can no longer be resolved. 

Even if the CIR can be modelled as a small number of 
distinct components (pulses) in the delay domain, it is not 
given that the Dirac delta is a good model for the shape of 
that pulse. The Dirac delta has constant frequency response 
within the system bandwidth (TV Aj) and this is in fact the 
only assumption we have made on the pulse shape (the pulse 
shape must also be such that (Al) is satisfied). The Dirac 
delta pulse in 0 can thus be replaced by any other pulse with 
constant frequency response within the system bandwidth and 
we would arrive at the same signal model. 


C. Probabilistic Model of the OFDM System 

We are now ready to present a probabilistic model which 
describes the complete OFDM system. The model expresses 
the joint probability of all variables in the system as a 
product of factors. This factorization of the joint probability 
is represented as the factor graph depicted in Fig. [2] In the 
following we introduce the variables and factors in the factor 
graph, moving from right to left. 
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■ MF | BP ■ 


Fig. 2. Factor graph representation of the probabilistic model describing the complete OFDM system and channel model. The shaded areas indicate multiple 
copies of the nodes, as specified by the index sets. The vector of observations y is included with a dotted line because it is known at the time of inference. 
Variables which are point-estimated (as opposed to a full variational estimate of the posterior pdf) are represented by circles with dashed lines. 


The interleaving, coding and modulation of the data bits are 
described in Sec. II-A The subgraph characterizing the system 
implementing these tasks, involves the factors 


fu k (u k ) =p{u k ) = 0.5 l[ Ufc 6{ 0 ,i}]) keJC, 
fc{c,u) =p(c\u) = l[ c= C(ii)] i 
f Mi (Xi,C M ) =p{x.i\c {l) ) = l[x i =M ( c('))]) * € v - 

The factor fc{c,u) describes the interleaving and channel 
coding process. By zooming in this factor can be replaced with 
a subgraph involving auxiliary variables and factors which 
describe the structure of the channel code and interleaver. 

The subgraph characterizing the observation process de¬ 
scribed by (|5]i and ([7]i involves the following factors for pilot- 
and data subcarriers, respectively: 


f P j (ot, t , /3) — p(yj\a, r; /3) 

= CN {y j -,x j ['f?(T)a\ j ,p), j G V, 
f Di (xi,a,T,/3) =p(y i \x il a 1 T; (3) 

= CN(y,;i i [$(T)a] i ,/3), i G V. 

When we say that the CIR in ([T} is sparse we understand that 
the number of multipath components L is relatively small. To 
model this property we use a Bernoulli-Gaussian prior which 
assigns large probability to the event ai = 0. We model L 
multipath components (usually L > L) of which only a subset 
is activated, i.e. has cq ^ 0. This allows us to derive an 
algorithm which inherently estimates the number of multipath 
components. In our implementation we select L = N, as 
we cannot reliably estimate more multipath components than 
there are observed subcarriers. Each component is assigned 
an activation variable Z\ G {0,1}, which is 1 when said 
multipath component is active and 0 otherwise. The sequence 
{zi,...,Zl} is modelled i.i.d. where each zi is assigned a 
Bernoulli prior with activation probability p : 


fz t {zi,p ) =p(zf,p) =Bern (zpp), l G £, 


where we have defined the set of multipath component indices 
£ = { 1. The prior density of the multipath coefficient 
ai is conditioned on Zi, such that zi = 0 implies ai = 0 and 
Z\ = 1 gives a Gaussian density with variance p: 

fat (op, zi, rf) = p(ai\zi;rj) 

= (1 - zi)5(ai) + zi CN(cq; 0, tj), l G £. 


When performing inference in this model, the estimated num¬ 
ber of active multipath components is L = |«11 q , where a is 
a vector containing the estimates of ai for all l G £. 

We finally need to impose a prior model on the multipath de¬ 
lays ti, l G £. The only prior information available is through 
the assumption that for all l G £ we have 0 < r/ < Tqp so an 
i.i.d. uniform prior is used: 

fn ( T i ) - P( T i ) = unif(rj; 0, T CP ), l G £. 

III. Inference Method 

The BER optimal receiver computes the MAP estimate 

u k = aigmaxp(u k \y-,p,ri,P), k G K, (8) 

«fe6{0,l} 

where JC = {1,.... K} is the index set of the information 
bits. The pdf p(u k \y; p, rj, P) p(u k ,y; p,r], j3) can ideally 
be found by marginalizing all variables but u k in the joint pdf 

P(y, z, a, t, x v ,c, u; p, p, (3) = p{y\x D ,a , r; /3) 
Y[p(ai\zi;p)p(zi-,p)p('n) F[p(^|c W )p(c|w) ]J p(u k ), 

zGD fcG/C 

Calculating the marginals of u k , k G 1C, is intractable and we 
resort to approximate Bayesian inference. 

A. Combined Belief Propagation and Mean-Field 

Our inference method is based on the merged belief prop¬ 
agation and mean-field (BP-MF) framework of f30) . In this 
framework a so-called belief function is found for each vari¬ 
able in the factor graph. The belief function is an approxi¬ 
mation of the posterior pdf or pmf of that variable. We abuse 
notation and let q(a) denote the belief of variable a. When the 
set of belief functions has been calculated, the MAP estimate 
of the kth data bit is easily found as the mode of q(u k ). For 
tractability we obtain a point-estimate of some variables; (z, t) 
are approximate MAP estimates of (z,t) and (p,f),(3) are 
approximate ML estimates of (p,r),f3). 

At the heart of BP-MF lies the so-called region-based free 
energy approximation (RBFE) |40) . The RBFE is obtained by 
splitting the factor graph into MF and BP subgraphs. In Fig. [2] 
we have indicated the splitting by a dashed line. The RBFEHis 

1 The RBFE is also a functional of a belief corresponding to each factor in 
the BP subgraph. BP-MF enforces consistency between the variable beliefs 
and these factor beliefs. For simplicity we do not mention the factor beliefs. 
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a functional of the point-estimates z,t, p,f), 0 and the belief 
functions q(a{), q{xf), q([c^] m ) and q(uk) for indices l £ 
£,i £ T>, k £ K, and m = 1,..., Q. The expression of the 
RBFE is given in Appendix [A] BP-MF seeks to minimize 
the RBFE under a number of normalization and consistency 
constraints. The messages of BF-MF are derived such that at 
convergence they satisfy the Karush-Kuhn-Tucker conditions 
of the constrained RBFE minimization, i.e., a (possibly local) 
minimum of the constrained problem is found. See f30) for a 
more detailed discussion of BP-MF. 

The understanding of BP-MF as RBFE minimization allows 
us to make a number of adaptations to the message-passing 
scheme to improve convergence speed. Further, we will see 
that this understanding is useful when analyzing convergence 
of the algorithm. 

11. Point-Estimation with BP-MF 


C. Relation to Prior Art 

To relate our receiver algorithm to current methods, we 
note that the decoding of many popular channel codes can be 
described as an instance of BP [431 in a factor graph )44|-]46|. 
For example, BP decoding of a convolutional code leads to the 
BCJR algorithm ]47) . We see in Fig.[2]that the merged BP-MF 
algorithm employs BP in the subgraph which represents the 
channel code, i.e., standard techniques are used for decoding. 

Similarly there are examples in the literature of MF infer¬ 
ence in factor graphs which resemble the MF subgraph of our 
receiver. The work ]24) uses a Bernoulli-Gaussian prior model 
similar to that in our work, while |23|. |25j use a hierarchical 
sparse Bayesian learning model. 

The strength of the BP-MF framework is now clear: It 
allows us to merge existing methods for channel decoding and 
sparse estimation into one unified receiver algorithm, which is 
formally derived as RBFE minimization. 


For tractability we wish to find a point-estimate of 
(z, t, p , r /, 0). In [301 it is proposed to point-estimate variables 
with BP-MF by including them in the variational inference and 
restrict their beliefs to Dirac or Rronecker delta functions. In 
this paper we take a conceptually different approach and let the 
RBFE be a function of the point-estimated variables. The point 
estimates are obtained as the minimizers of the RBFE. The re¬ 
sulting estimators are the same with both approaches. We also 
note that in a pure MF context both approaches correspond 
to variational EM estimation with all other variables treated 
latent variables (3§, f4T). To point-estimate the model 


parameters (p,t],/3) with the approach in [301, one needs to 


impose an improper (flat) prior on these parameters. Such 
improper priors cannot be rigorously treated when forming 
the RBFE. Our approach avoids such technical difficulties. 
As explained next, with our approach, p, r) and 0 can be 
interpreted as approximate MF estimates and zi and 77 can 
be interpreted as approximate MAP estimates. 

All of the point-estimated variables (z,T,p,p,z 3) are only 
present in the MF subgraph. Following an approach similar to 
a lower bound on the log model evidence can be obtained: 


In p{y, p, fj , 0) > — -Fbp-mf + const., 


(9) 


IV. Sparse BP-MF Receiver Algorithm 

To minimize the RBFE, we apply the BP-MF algorithm 
given by Eq. (21)-(22) in [30] within the factor graph of 
Fig. [5] In the following we use the notation (-) a to denote 
expectation with respect to the belief density q(a). We follow 
the convention of j30) in naming the messages. In ED a 
similar BP-MF receiver is derived, which does not exploit 
channel sparsity. 


A. Message Passing for Channel Estimation 

1) Update of Coefficient Belief: We start by finding belief 
updates in the MF subgraph. To find the update of q(ai), we 
calculate the messages passed to the node 07 : 

mf , 7 fexp(-f?" 1 |a / | 2 ) if z t = 1 , 

1 if zi = 0 

oc exp^/T 1 (|)a]i| 2 ) Xi 

« exp^-zT 1 (| y 3 - a: i [’i f (f)a] i | 2 ) Q ^j , 

which holds for all( £ £, i £ V and j £ V. Taking the product 
of all messages going into the node 07 gives its belief: 


where Fbp-mf is the RBFE ( |34| ) and the constant only depends 
on beliefs in the BP subgraph (including q{xi ), for i £ V), 
i.e., it does not depend on z,f,p,fj,0 and < 7 ( 07 ) for l £ C. 
Minimization of the RBFE with respect to (p, fj. 0) thus leads 
to maximization of the lower bound on the log model evidence 
and the estimates are thus approximate MF estimates. 

Similarly, we have the lower bound 

In p { y , zi \ p , fj , 0 ) > -Fbp-mf + const., (10) 

where Fbp-mf is again the RBFE and the constant is 
different from CD but again does not depend on z,f,p,fj,0 
and 17 ( 07 ) for l £ C. By Bayes rules we have proportionality 
between the joint pdf p(y,zpp,fj,0) and the posterior pdf 
p(zi\y] p, f), 0). The update of zi can then be recognized as an 
approximate MAP estimate. A similar argument can be made 
for the update of 77. 


/ , _ | CN(op pz, of) if zi = 1, 
q[ai> ~ \(J(az) if 2 ; = 0 , 

with the active component mean and variance 


p-i = 6-fqi ( 12 ) 

<Tz 2 = (si + p -1 ) \ (13) 

where we have introduced 

s* = 0- V H (*l) <X H X) xi> t/>(T 7 ) (14) 

qi = 0~ 1 ft ii (Ti)r (15) 


' = < X >L y <x Hx )^ ( 16 ) 

Note that the belief of inactive components (27 = 0) becomes 
a point mass at 07 = 0 , thus eliminating the influence of that 
component in the product X\P(r)o:. We have defined the set 
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of currently active components as A = {l : zi = 1 } and the 
vectors /2 = [p, 1} .. .,p L ] T , dr 2 = [df, ■ • • 

2) Joint Update of Delay and Coefficient Belief: We now 
turn our attention to the estimation of the multipath delays 77 . 
To improve the convergence speed of the algorithm, we find 
this update by minimizing the RBFE jointly with respect to 
the beliefs q(ai) and fj. Due to the prior 73 ( 77 ), the following 
expressions are valid for 77 £ [0, Tcp]. We are only concerned 
with active components, i.e., I £ A and thus zj = 1. Writing 
only the terms of the RBFE ( [34] ) which depend on q(ai) and 
n, we get 

F BP . MF (q(ai) 1 fi) cc e [ q{a t ) In — dai, (17) 

J Q\Otl,Tl) 

with 

Q(ai,Ti) =p(ai\zr,r))p(fi)exp( (in p(y\xv, a, t;$)) ) 

\\ / X' D ,OL\ l J 

oc CN(ai;pi,di)expf V (18) 

\si + 77 - 1 ) 


search. We have the following lemma, which we will use in 
the convergence analysis: 

Lemma 1: The procedure listed in step 1-2, above, followed 
by an update of q(ai) leads to a non-increasing RBFE. 

Proof: First, note that the updated 77 , does not decrease 
g n ( 7 j). It then follows that by selecting the maximizing q{afj 
in ( fl9| ), the RBFE is non-increasing. ■ 

3) Joint Update of Activation Variable and Coefficient Be¬ 
lief: We now turn our focus to the update of the activation 
variable zp It is again desirable to perform a joint update of 
Zi and q(cq). We proceed in a similar way as we did for the 
multipath delays. The terms of the RBFE ( |34| > which depend 
on q(ai) and zi are denoted as -F B p-mf( <7 ((*/); zf). We then 
define 


g Zl {zi) — max -F BP . MF (q(ai),Zi) 

da ; =l 

oc e +lnf+ln p if*, = l, 
\ln(l — p) if A = 0, 


(23) 

(24) 


where of, ffi, si and qi are given by ( [ 12 ] ) - ( fl5| and thus 
implicitly are functions of 77 . We need to minimize ( fT7| under 
the normalization constraint f q(ai) dai = 1. To do so, define 


g Tl (fi)= max -F bp _ M f (q(ai), n) ( 19 ) 

g(ai) da; = l 


oc e In J Q{a U T l )dai 

<x e 7rzr l^ H (^)^| 2 - 
si + r) 1 


( 20 ) 


( 21 ) 


The result in ( [20] ) is easily obtained by noting that ( fl7| can 
be rewritten as 


T^bp-mf oc e KL 


q{ai) 


Q(ai,fi) 


f Q(di,fi)ddi 


- In J Q(ai,Ti) da;, 


where KL[-11-] is the Kullback-Leibler divergence. The coef¬ 
ficient belief is selected as the maximizing q(a{) in ( fl9| , i.e., 
q{cn) = Q(cq,ri)/ f Q(ai,ri) doq, which is easily shown to 
coincide with the result in ( fTT) . 

Since si is constant with respect to 77 , we find the delay as 


77 = argmax g Tl ( 77 ) = argmax (fi)r\ 2 . (22) 

ti G [0, Tcp] ti G [0,Tcp] 


We recognize the objective function in ( [22] ) as the peri- 
odogram of the residual vector r. While it is possible to 
find the maximizer of the periodogram, doing so has high 
computational cost. In our iterative algorithm, we instead find 
an update of 77 which cannot increase the objective in ( |22[ . 
Denote the updated delay estimate as fj^ and the old delay 
estimate as ff Our scheme now reads: 


1) Find initial step A = 1 J 


19"Uf 11 ) | 

2) If g T (rf-^ + A) > g T (j\-\ set rf = ff“ 1] + A 

and terminate. Otherwise set A = and repeat step 2. 

Functions g' T (ri) and g"(ji) are the first and second deriva¬ 
tives of the objective in The scheme gives the Newton 
update of 77 if this value increases the objective function and 
otherwise resorts to a gradient ascent with a backtracking line 


This result is easily obtained by following steps analogous to 
|T7]) - £T). The activation variable is given by the decision 
problem Zi = maxg !e r 0jl } g Z[ (zj). Writing the “activation 
criterion” g Zl ( 1) > g Zl ( 0) we get 


l «| 5 


>ln^ 


In 


1-/5 


(25) 


'1 w i P 

If the above criterion is true we set Zi = 1; otherwise we 
set Z( = 0. The corresponding update of < 7 ( 07 ) is given as 
the maximizing q(ai) in ( [23] ), which remains as in ( [TTj ). The 
criterion in ( [25] ) is the same as that obtained in {24). 

4) Update of Channel Parameter Estimates: The channel 
parameters (p, 77, 0) are estimated as the values which min¬ 
imize the RBFE. Writing only the terms of the RBFE ( [34] ) 
which depend on the channel parameters: 


-Fbp-mf(/ 5, ? h/3) 


oc e - In n p{zi\p)p(oq\zi; fj)p(y\a, f, xt>\ 0) 

\ lec 

tx e Iloilo ln/5 + (L - ||£||o)ln(l - p) - N ln/3 - /3 _1 m 

- Ilillolnp-r)' 1 y] (|/i;| 2 + <jf), (26) 

l-Zl — l 


where 


u A (Hy-X’S'CrHI 2 )^ 

= ||y|| 2 + (X h X)^ ^(r A )0 A 

+ ^^V H (r/)(X H X) a!i 3 V’(fz) 

leA 

-2Re{y*(X) x ^(T A )0 A }. (27) 

It is readily seen that Fbp-mf(/5, fp /3) can be minimized inde¬ 
pendently with respect to each of the parameters. By taking 
derivatives and equating to zero we find the global minima 
(the second derivatives are all positive): 


P = 


L 


(28) 
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V = 


£l:*=l(l£j| 2 +*i 2 ) 

Iloilo 


P = 


u 

w 


(29) 

(30) 


5) Iterating all Coefficient Beliefs Ad-Infinitum: We now 
continue and calculate in closed form the beliefs q(a{), l £ A, 
of all (active) coefficients that result from iterating the updates 
of all coefficient beliefs ad-infinitum. Doing so increases the 
convergence speed of the algorithm. 

Since q{af) = S(ai) for all l £ C\A, the following 
discussion is only concerned with the belief of active compo¬ 
nents. First note that the variance ( p~3] > of an active multipath 
coefficient of does not depend on the beliefs of the remaining 
coefficients q(ak), k A l- The mean > [T2| ) of the ith coefficient, 
on the other hand, depends on the remaining mean values as 


w = df 

[QIm p« 

- X] /3~ 1 '0 H (r;)(X H X) ;CD ’0(f fc ) /tfc) , 

fce ^V [Qlz.fc 

for all l £ A. The matrix Q is of size \A\ x \A\ and we 
have abused notation in using l,k as indices into this matrix, 
because 1 < l, k < L, even though \A\ < L. The above 
equation is recognized as the Gauss-Seidel (48) iteration for 
solving the system of linear equations 

Q ba = p ( 31 ) 

with 

p = (X)^y 

Q = <X H X)^ vp(^) + r i L 

It follows that the updates of fii, for all l £ A, converge to 
the solution fix found by solving ( |3T| l. 

We note that in the hypothetical special case where the 
beliefs of X are point estimates (or equivalently known 
without uncertainty) y = X$(f;)a; + w is a linear obser¬ 
vation model with Gaussian noise. In this case, the estimator 
fix = Q~~ 1 p reduces to the LMMSE estimator of a A in the 
linear observation model under the Bayesian model dictated 
by the current beliefs of the remaining variables. The estimator 
fi A = Q 1 p is, however, not the LMMSE estimator of n A 
when the uncertainty of the estimate of X is considered. 


space constraints, we will not discuss this part of the algorithm 
in detail. 

The only messages in the BP subgraph that cannot be 
calculated by direct evaluation of the sum-product algorithm 
are 


=m fD ^ x .{xi) 


oc CN 


Xi\ 



(32) 


where hj = [T'frja:], is the channel frequency response at 
subcarrier i. Its mean and second moment are 


(■ h i) a ,T = [^(^)A]i 

(\hi\ 2 ) aT = [^( f )(/ i / iH + diag(<T 2 )) 1 5' H (f)] ij .. 

Note that even though the above expression has the form 
of a Gaussian, the messages are probability mass functions 
obtained by evaluating the above Gaussian at the points of 
the symbol alphabet Ad followed by appropriate normal¬ 
ization. The messages (xf) constitute the interface 

from the continuous-valued channel estimator to the discrete¬ 
valued decoder. The mean in ( |32) i can be interpreted as an 
LMMSE estimate as follows: Consider the observation model 
yi = ffixi + Wi where p{wi) = CN(iUi; 0, /3) and let q{a{) be 
the true density cq and 77 be the true value of 77 . Impose a 
prior p(xi) = CN(xi; 0, cr 2 .) on Xi. The LMMSE estimator 
of Xi is now 

-LMMSE _ 2A (hi) a T 

~ (N 2 ) +fiaf 2 ' 

\ / OL.T 

By letting cr 2 —> 00 to express that we have no prior 
information on Xi, we recover the mean in ( |32| ). Note that 
a similar analogy does not exits for the variance ( |32) . 

When BP messages have been passed in the BP subgraph, 
the beliefs of the data symbols Xi, i £ V, are calculated from 

q(xi) oc rn^ i _ yx .(x i )m^ f( _ > . x .(xi). (33) 

Since qixf) is a probability mass function, we can use 
straightforward evaluation of finite sums to obtain (X) and 
/X I[ X;^ , which are used in the belief updates in the MF 
subgraph. 


B. Message-Passing for Decoding 

In the previous subsections we derived the belief functions 
q(-) of the variables whose factor neighbours are in the MF 
subgraph only. In the BP subgraph, i.e., detection, demapping, 
decoding and deinterleaving, we instead focus on calculating 
the messages that are passed in the factor graph. All messages 
passed to the BP subgraph, are functions of discrete variables 
(i.e., coded or information bits) and it is therefore tractable 
to calculate these messages directly by the sum-product algo¬ 
rithm. This has been studied thoroughly in the literature for 
various different coding schemes, see e.g. (44), 1451. Due to 


C. An Incremental Algorithm 

Algorithm[l]combines the derived belief update expressions 
into an iterative scheme that performs joint sparse channel 
estimation and decoding. The algorithm is split into two parts: 
channel estimation (lines [5] - [30] ) and decoding (line 32 1 . 


The outer loop alternates between these two steps until the 
information bit estimates have not changed in 10 iterations or 
a maximum of 50 iterations is reached. 

The scheduling of the channel estimation is inspired by 
|23|. The basic idea is to construct a sparse representation 


of the wireless channel in the form of 0 by sequential 
refinement of the multipath components in the constructed 













Algorithm 1: Off-the-grid sparse BP-MF receiver. 

Input: Observations y, pilot indices V and pilot symbols x-p. 
Output: Belief functions of data bits { q(uk)}keK■ 

Notes: Define the set of components as £ = {1,. .., L} and 
the set of active components as A = {l £ C '■ zi = 1}. 
t if- Vector with values from equispaced grid on [— ^'2 , Tcp]. 


Initialize channel parameter estimates (p, p,/3). 
z, t, p, <t 2 «— Zero vectors of length N. 
while Outer stopping criterion not met do 
while Inner stopping criterion not met do 
p^, Updates from ( |3 1 and CD- 

Activate an inactive component: 
if the inactive set C\A is non-empty then 
l <— Any index from the inactive set C\A. 
zi <— 1 . 

t i 4— Value from \22\ calculated on the grid r. 


2 

3 

4 

5 

6 

7 

8 
9 

10 
11 
12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 end 


Va 

Tl 4 


Updates from <[37} and PL 


Update via the scheme in Sec. IV-A2 
2 ' Updates from P} and 


if activation criterion (|25t is false then 
zi <— 0 . 

Reset to the value calculated in line pi 

end 
end 

Update all components currently included in model: 

for l £ A do 

77 «— Update via the scheme in Sec. |IV-A2 
pi , of ■<— Updates fro m (| 1 2^ and ( | 1 3p . 
if activation criterion (|25^ is false then 
| zi<- 0 . 

end 
end 

4 Updates from ( |31| l and PJ- 
p, 77 , -<— Updates from (|28M29|> and t[30j. 
end 

Update the messages m ^ _ ¥x .(xi) from 
Iterate message-passing in the BP subgraph. 

Update the beliefs q(xi) from [33}. 


channel model. One component is determined by the set of 
parameters ( zi,ai,Ti ) for a particular index l. All multipath 
components are initialized in the inactivated state, i.e., z is the 
zero vector. 

The channel estimation procedure alternates between two 
stages: In the activation stage (at line [7j one of the inactive 
components are activated and its multipath delay and coeffi¬ 
cient is calculated. The activation criterion ( [25] ) determines if 
the component should stay activated. In the second stage (start¬ 
ing at line 201 , all active components are sequentially refined. 
Again, the criterion ( [25] ) determines if a component should 
be deactivated. The channel estimation thus iteratively adds, 
updates and possibly removes components until the stopping 
criterion is fulfilled. The multipath delays are tracked via the 
scheme in Sec. IV-A2 in a way that resembles the operation of 
a rake receiver |49|. The approach presented here differs from 
rake receivers by providing an integral criterion for inclusion 
an exclusion of components (rake “fingers”) via [25j ). The 
multipath delay of the newly activated component is found 
via a maximization over the grid t. The grid should have a 
sufficiently fine resolution, such that the initial estimate of the 
delay is close to the global maxima in ([221. We choose the 


distance between points in the grid as (iVAy) _ 1 / 8 . As inner 
stopping criterion we use — l//3[ t- 1 l| < lO -3 //^ 4-1 !, 

where t gives inner iteration number. The number of inner 
iterations is limited to 50. 

During the first outer iteration the decoder has not been 
used yet and symbol beliefs q{xi) are not available for the data 
subcarriers (indices i £ V). During the first iteration the chan¬ 
nel estimator therefore only uses the pilot subcarriers (indices 
j £ V). We initialize the noise variance as /3 = ||y|| 2 /iV and 
the active component prior variance as 77 = 1. The activation 
probability is initialized as p — 0.5. To ensure that a sufficient 
number of components are added to the model, the activation 
probability is kept fixed during the first outer iteration. 

D. Convergence Analysis and Computational Complexity 

We now wish to analyze the convergence properties of 
Algorithm |T| First recognize that the algorithm alternates 
between updates in the MF and BP subgraphs of Fig. [2] To 
analyze convergence, we discuss under which conditions each 
of these sets of updates lead to a non-increasing RBFE. If all 
updates give a non-increasing RBFE it can be concluded that 
the algorithm converges. 

We first discuss the updates in the MF subgraph, i.e., of be¬ 
lief functions q(a{) (l £ C) and point-estimates (z,f 1 p,f),/3). 
During these updates the messages m /^, -^ X ( x i) are kept 
fixed. The joint update of 77 and q(ai) gives a non-increasing 
RBFE as per Lemma [T] A similar conclusion can be drawn 
regarding the joint update of zi and q(ai). The individual 
update of q(ai) is found via the method of Lagrange mul¬ 
tipliers applied to the RBFE with normalization constraint 
f q{ai) do 1 = 1. The second order functional derivative of the 
RBFE is a positive semi-definite function; it 

follows that the RBFE is convex in this argument. It can be 
concluded that the update of q{on) is the global minimizer of 
the RBFE and the objective is thus non-increasing. A similar 
conclusion can be drawn regarding the channel parameters, 
cf. Eq. [26| . All updates in the MF subgraph thus give non¬ 
increasing RBFE. 

We now analyze the convergence in the BP subgraph, i.e., 
the updates of belief functions q(xp), q([c^] q ) and q(uk). 
Considering the belief functions of variables in the MF sub¬ 
graph as fixed and ignoring scaling and constant terms, the 
RBFE is equal to the Bethe free energy corresponding to the 
factorization (see |30] Appendix E]) 

n n p ( uk )■ 

k£K. 

Further, all messages in the BP subgraph are equal to the 
messages obtained from BP applied to the above factorization. 
This means that we can analyze the behaviour of message¬ 
passing in the BP subgraph, by analyzing BP applied to the 
above factorization. If the factor graph does not contain any 
loops it can be shown that BP globally minimizes the Bethe 
free energy (which in this case is equal to the Gibss 

free energy) and convergence of the complete BP-MF receiver 
algorithm is guaranteed. Recall that the factor p(c\u) describes 
the channel code and may be replaced by a number of auxiliary 
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variables and factors. The specific structure of the BP factor 
graph is thus determined by the channel code. In the special 
case of convolutional channel coding with binary or quadrature 
phase-shift keying (BPSK or QPSK) modulation, the BP graph 
does indeed become a tree-graph and convergence of Alg. |T| 
is guaranteed. If the modulation order is higher than QPSK, 
loops occur between /m ; and fc and convergence can thus 
not be guaranteed. 

For other common channel codes such as Turbo and LDPC 
codes the subgraph represented by fc contains loops. How¬ 
ever, BP has empirically been shown to converge for decoding 
of many channel codes and it is a well known practice 
to use BP even though convergence cannot be guaranteed 
theoretically, see e.g. j44j- ED- When BP does converges it 
has been shown to be to a (local) minimum of the Bethe free 
energy [(50) , which further explains why we do indeed obtain 
convergence of Alg. [T] in our numerical investigations. There 
exists conditions under which BP is guaranteed to converge 
in loopy graphs, e.g. j5Tj|, [[52). These are, however, not 
applicable to our situation. 

We now turn our attention to the computational complexity 
of the channel estimator, i.e., the loop starting at line [5] The 
most demanding part of the channel estimation in terms of 
computational complexity is the calculation of via ©• 
We show in Appendix |B| that (under a conjecture) this update 
can be calculated in time O ^min(L 2 iV, LNs/N)^ , where L 
is the number of components currently included in the model. 

The grid search in line[TT]is recognized as the maximization 
of the periodogram on a grid, which can be calculated via a 
fast Fourier transform in time 0(N log A r ) when the grid is 
assumed to be of size O(N). 

The loop starting at line [2T] necessitates the calculation of 
r in ©■ Direct evaluation is a computation of order 0{LN) 
for each of the L iterations in the loop. By updating r with 
each change to fi, the direct evaluation can be avoided and 
the complexity of each iteration in the loop becomes 0{N), 
which is the same as that of all other operations inside the 
loop. The overall complexity of the loop is thus O(LN). 

With these remarks, we see that the overall complexity per 
iteration of the channel estimator is O (mm(L 2 N, LNVlSf)^. 

V. Numerical Evaluation 

In our numerical evaluation we use an OFDM system as 
described in Sec. [HI with parameters listed in Table [I] We use 
a rate-1/2 non-systematic convolutional channel code, decoded 
by the loopy BP implementation from the Coded Modulation 
Library]^ The pilot signals are chosen at random from a QPSK 
alphabet. The pilot subcarriers are located equispaced, i.e., the 
number of data subcarriers between any adjacent pair of pilot 
subcarriers is fixed. The selected number of subcarriers N = 
601 ensures that both the first and last subcarriers are pilots. 

We use a stochastic channel model with an exponentially 
decaying power delay profile. The channel responses (|T) are 
generated as follows: First, the number of multipath compo- 

- Available from http://iterativesolutions.com/Matlab.htm 


Parameter 

Value 

Number of subcarriers ( N ) 

601 

Subcarrier spacing (A/) 

15 kHz 

Cyclic prefix duration (Tq P ) 

5.2 fi s 

Number of pilots (equispaced) 

101 

Pilot spacing (implied by the above) 

6 

Modulation of data subcarriers 

256-QAM 

Convolutional channel code polynomial 

(561,753) 8 

Interleaver 

Random 

Mean number of multipath components 

« 5 

Maximum multipath delay (r max ) 

5.2 /us 

Time constant of exp. decaying channel (v) 

1.5 /LIS 


TABLE I 

Simulation parameters. 


nents L is drawn from a zero-truncated Poisson distributiorf] 
with A = 5 (the mean number of multipath components is then 
5.034). The L multipath delays in r are drawn independently 
and uniformly from (0,r max ) and the L coefficients in a are 
drawn independently from the conditional densities 

p(a.i \ti) = CN(cq; 0, wexp(— tj/u)) , l = l,...,L, 

where u is chosen such that E[|/u| 2 ] = 1 and v = 1.5 fis. 
The noise variance is calculated based on the realization of 
the channel frequency response as /? = ||ft)I 2 /(SNR ■ N ). 

In our numerical investigation, we asses the performance 
of the considered receivers in terms of average coded bit 
error rate (BER) and normalized mean squared error (MSE) 
of the channel frequency response vector, calculated as ||ft — 
ft)| 2 /||ft)| 2 . These averages are obtained from 2500 Monte 
Carlo trials (5-10 6 information bits), with each trial containing 
one OFDM symbol. For SNR > 20 dB we use 15,000 trials 
(3 • 10 7 information bits) to get reliable BER estimates. The 
OFDM symbols and channel realizations are generated i.i.d. 
according to the above description. 

A. Evaluated Algorithms 

We evaluate our algorithm (Off-the-grid BP-MF) and com¬ 
pare with the following reference algorithms: 

Turbo-GAMP / /T]/: The algorithm employs a sample-spaced 
grid in the delay domain, i.e., the resolution of the grid is T s = 
( NAf) _1 « 111ns. We use a grid with L pre = L post = 20 
extra grid points (taps) before and after the cyclic prefix to 
capture sidelobe energy. The channel model that we use here 
does not include clustering effects and we therefore use the 
version of Turbo-GAMP without the hidden Markov model of 
the chain tap clusters. For each channel tap coefficient a large- 
tap and small-tap variance is provided along with a large-tap 
probability (see (T3) for more details). These are estimated 
via the EM algorithm provided in p"3) from 10,000 channel 
realizations. Turbo-GAMP is provided with significant prior 

3 The pmf of the zero-truncated Poisson distribution is p(k ) = , 

k = 1,..where A is the mean of the Poisson distribution before truncation. 

The mean of the zero-truncated Poisson distribution is -—^— 5 -. 

l—e —A 
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Fig. 3. Averaged BER (left) and MSE of channel frequency response estimate (right) vs. SNR. The legend to the right applies to both plots. 




Fig. 4. Averaged BER (left) and MSE of channel frequency response estimate (right) vs. number of pilot subcarriers at 18dB SNR. The legend to the right 
applies to both plots. 


information on the CIR via these statistical values. We also 


provide Turbo-GAMP with the true noise variance, as [131 
does not give a way to estimate this value. 

Freq. -domain BP-MF [I7j, 153]: The algorithm directly 
estimates the channel frequency response h via a BP-MF 
framework. This receiver can be considered as a JCED ex¬ 
tension of a receiver with LMMSE channel estimation. As the 
prior on h we use a zero-mean complex normal distribution 
with the robust covariance matritf] described in [31]. 

Oracle Channel Estimator: This receiver is provided with 
the true value of the noise variance /3 and uses an oracle 
channel estimator, which computes an LMMSE estimate of 
h with the knowledge of the transmitted symbol vector x 
(i.e., both pilots and data are known), the vector of delays t 
and the probability density function of the channel multipath 
coefficients in a, i.e., the probability density function of h is 
known exactly. From the point estimate of h, the messages 
( Xi ) are computed for all * £ V (see ([32])), followed 
by 5 iterations in the BP subgraph of Fig. [2] We note that 
this receiver computes a very accurate channel estimate and 
we have observed that its BER performance is on par with 
a receiver that has knowledge of the true channel frequency 
response vector h (perfect channel state information). 


B. Varying the Signal-to-Noise Ratio 

Fig. [3] shows performance results for varying SNR. We 
first note that Off-the-grid BP-MF perform very well in both 
BER and MSE. Its BER is remarkably close to that of the 
oracle channel estimator, indicating that there is very little 
margin for improvement of the algorithm. The two reference 

4 Since the number of multipath components is relatively small in our 
simulations, the channel frequency response does not follow a Gaussian 
distribution and we have observed that better performance can be achieved 
by scaling the robust prior covariance as S' = 2bS. 


algorithms Turbo-GAMP and Freq.-domain BP-MF perform 
worse in terms of MSE by a factor of approximately 7 dB or 
more. The BER loss of Freq.-domain BP-MF compared to our 
algorithm corresponds to an approximately 0.5 dB increase in 
SNR. 

We conjecture that the error floor observed for Turbo- 
GAMP in high SNR is caused by the restriction of the delays 
to the sample-spaced grid in its design. If the delays are gen¬ 
erated to be located on such a grid, the performance of Turbo- 
GAMP is very close to that of the oracle channel estimator (not 
shown here). We have also investigated a version of Turbo- 
GAMP with the small-tap coefficient variance set to 0 and 
noted that it does not yield better performance than the version 
of Turbo-GAMP shown here. We note that such error floors in 
BER and MSE have previously been observed for other grid- 
based sparse channel estimation algorithms, see for example 

tH- (35). 

C. Varying the Number of Pilots 

In Fig. [4] we vary the number of pilot subcarriers to 
investigate if the increased channel estimation accuracy offered 
by our algorithms allow for decreasing the number of pilots 
without significantly impairing the performance. 

With an equispaced pilot pattern, the pilot spacing should 
obey Ap > 1/(A/Tcp). If this is not the case, the channel 
estimate obtained in the first iteration (where only pilot sub¬ 
carriers are used) is of such low accuracy that the iterative 
processing does not converge to an estimate with low BER. 
From the viewpoint of sparse channel estimation, this can be 
understood as an identifiability issue; even in the noise-free 
case with a single multipath component, the pilot observations 
does not uniquely determine the delay of that component 

5 We define the pilot spacing as Ap = D + 1, where D is the number of 
data subcarriers between adjacent pairs of pilot subcarriers. 
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Fig. 5. Averaged BER (left) and MSE of channel frequency response estimate (right) vs. mean number of multipath components at 18dB SNR. The legend 
to the right applies to both plots. 


on the interval [0,Tcp]. In a compressed sensing context the 
criterion A p > l/(A/Tcp) ensures that the dictionary matrix 
have the restricted isometry property (RIP). For the frequency- 
domain BP-MF receiver (which uses LMMSE channel esti¬ 
mation), the criterion ensures that the coherence bandwidth is 
larger than the frequency separation (ApA/) between pilots. 
We can also understand the phenomenon from the viewpoint 
of Nyquist-Shannon sampling where time and frequency have 
switched role; the channel frequency response is sampled (in 
the frequency domain) with rate l/(AfAp), which fulfills the 
Nyquist-Shannon sampling criterion if the power delay profile 
is supported on the interval [0, Tcp] with Tq p < l/(AyAp). 

When the CIR is sparse the criterion on the pilot spacing 
can be circumvented by employing a non-equispaced pilot 
pattern. This is essentially an application of the principle of 
compressed sensing 09’ allowing fewer frequency-domain 
samples of the channel frequency response than dictated by 
the Nyquist-Shannon sampling theorem. For the Off-the-grid 
BP-MF receiver we use a pilot pattern which is generated 
according to the method in We have observed that 

all reference algorithms achieve the best performance with 
an equispaced pilot pattern, which is therefore used in the 
numerical evaluation. 

In Fig. [4] we clearly see that Off-the-grid BP-MF can operate 
with significantly fewer pilots than the reference schemes. The 
criterion A p < l/(A/Tcp) dictates a pilot spacing less than 
13. The identifiability issue is clearly seen, since the reference 
algorithms work well at 51 pilots (A p = 12), while their BER 
increases significantly at 41 pilots (A p = 15). Off-the-grid 
BP-MF performs well with significantly fewer pilots due to 
the use of a non-equispaced pilot pattern and a sparse channel 
estimator. We reiterate that Turbo-GAMP did not show lower 
BER when using a non-equispaced pilot pattern, because it 
does not fully exploit the sparsity of the CIR. 

D. Varying the Number of Multipath Components 

We now investigate how the algorithms perform when the 
assumption of a small number of multipath components is 
not fulfilled. The performance versus the mean number of 
multipath components is shown in Fig. [5] These mean values 
are obtained by setting the parameter A of the zero-truncated 
Poisson distribution as A = 1,3, 5,13, 32, 80, 200, 500,1200. 

All the algorithms show decreasing BER performance as the 
number of multipath components grows. This is because less 
structure is present for channel estimation and therefore the 


estimation problem becomes harder (the channel degrees of 
freedom increases). For Freq.-domain BP-MF the decrease in 
structure is incarnated as a more frequency-selective channel, 
which deteriorates the estimator performance. For the sparse 
channel estimators the increasing number of multipath com¬ 
ponents means that the CIR cannot be resolved into distinct 
multipath components, thus deteriorating performance. We 
note that the BER performance of Off-the-grid BP-MF is better 
than or on par with Freq.-domain BP-MF no matter how many 
multipath components are present. Specifically, the sparse 
channel estimator performs as well as an LMMSE channel 
estimator when the individual multipath components cannot 
be resolved, while offering lower BER when the multipath 
components can be resolved. 

E. Convergence Speed 

We now proceed to investigate the convergence speed of 
the algorithms, again using the settings given in Table. 0 Fig. 
[6] shows average values of BER and channel reconstruction 
MSE versus the number of outer loop iterations. The metrics 
are recorded immediately after the decoding operation. These 
results are for 18 dB SNR. It is clear that all algorithms 
converge within 15 iterations. Due to the initialization pro¬ 
cedure that we have used, we note that the results at the first 
iteration correspond to those of receivers that only use pilots 
for channel estimation and channel decoding with only a single 
forward-backward passing of messages in the BP subgraph. 
Such a receiver clearly exhibits a very low performance. This 
indicates the significant performance improvement achieved by 
JCED receivers. We also note that both Off-the-grid BP-MF 
and Turbo-GAMP decrease the BER very fast and are close 
to convergence at the second iteration. This is an important 
property for any practical implementation of the receivers, 
as the majority of the benefit of iterative processing can be 
obtained with a small number of iterations. 

F. Importance of Pilot Subcarriers 

We have observed that the pilot subcarriers are most impor¬ 
tant in the first iteration of the receiver, i.e., during the initial 
period of the iterative processing. To illustrate this effect, we 
note that the pilot subcarriers can be ignored at any point 
during the iterative processing, by simply removing the nodes 
fp 3 for all j £ V in the factor graph in Fig. [2] In Fig. [7] we 
show BER and MSE vs. iteration number for two versions of 
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iteration number at 18 dB SNR. The legend to the right applies to both plots. 



Off-the-grid BP-MF: The first version uses pilot information 
in all iterations, while the second ignores all pilot subcarriers 
after the first iteration. To illustrate that pilot subcarriers can 
be ignored after the first iteration, even when there is only 
limited pilot information available, we use a non-equispaced 
pilot pattern as described in Sec. |V-C| and show results for 
both 31 pilots (Ap = 20) and 51 pilots (A p = 12). 

In Fig. [7] it is observed that when only 31 pilots are used, 
convergence is slower than what is seen in Fig. [6] This is 
because significantly fewer pilots are available. 

We also observe that there is negligible difference between 
using pilots in all iterations, to using pilots only in the first 
iteration. This, perhaps surprising, result makes it clear that 
the pilots are only important to get the iterative processing 
started. It demonstrates that the feedback of data bits for use 
in channel estimation is an extremely powerful technique. It 
also suggests that pilot schemes should be designed solely with 
the objective to help initialization of the iterative processing, 
and that further reduction of pilot overhead may be possible 
if the pilot scheme is designed for this particular purpose. 

VI. Conclusions 

In this paper we have derived a joint channel estimation and 
decoding receiver which employ sparse channel estimation. An 
OFDM receiver is derived using the BP-MF framework for 
approximate Bayesian inference. Unlike other sparse channel 
estimators, our scheme does not restrict the channel impulse 
response multipath delays to a grid. As a result, our receiver 
can truly exploit sparsity of the channel impulse response, 
without resorting to approximate sparsity (as in m- s 
ED). We have presented a numerical evaluation that compares 
our algorithm with state-of-the-art methods, i.e., Turbo-GAMP 
[j 1 3 1 and Freq.-Domain BP-MF pT) . 

The numerical evaluations showed several interesting re¬ 
sults. First of all, we have demonstrated that following the 
method of e.g. Turbo-GAMP and restricting the multipath 


delays to a sample-spaced grid is not a viable approach 
because the channel impulse response is only approximately 
sparse on this grid. 


When random pilot patterns are used, our receiver allows 
for a significant reduction in the number of pilot signals, 
without any notable decrease in BER performance. The pilot 
signals are most important during the first iteration of our 
algorithm, to initialize the iterative processing. An interesting 
research direction is the joint design of pilot signalling and 
channel coding schemes, which allows for initialization of an 
iterative receiver while using a minimum amount of redundant 
information. To this end, the idea of using training bits (12), 

1131 instead of pilot signals may be useful. 


The performance of our algorithm degrades stably when 
the assumption of a sparse CIR is not fulfilled. In the case 
where the number of multipath components is very high 
(3> NAfTcp), i.e. the channel impulse response is diffuse, 
our algorithm perform as well as a frequency-domain iterative 
receiver that does not assume channel sparsity. This is a very 
appealing property as, to obtain good performance, receiver 
designers need not be worried whether the sparsity assumption 
is fulfilled or not. 


Appendix A 

The Region-Based Free Energy Approximation 

At the heart of the derivation of our algorithm lies the 
RBFE as defined by [ [30} Eq. (17)], [40j. In this paper we 
use the RBFE of the probability distribution corresponding to 
the factor graph depicted in Fig. [2] For reference, we give here 
the complete expression of the RBFE, 

-Fbp-mf = Fbp + Fmp, (34) 

with 
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systems of linear equations. In the following we show that the 
number of iterations of the CG method is (D[yN). 

We first need a conjecture on the eigenvalues of the 
(Hermitian-Toeplitz) matrix T = /3 _1 rj\l/(T / |)\I' H (T^). 

Conjecture 1: There exists an upper bound on the largest 
eigenvalue of T, which grows linearly with N , i.e.. 


Number of subcarriers (TV) Mean number of multipath 

components (A/(l — e —A )) 

Fig. 8 . Average of the largest eigenvalue of the matrix T encountered 
during one execution of Off-the-grid BP-MF. Average obtained from 1000 
Monte Carlo trials. Both plots were generated using the simulation scenario 
described in Sec. [V] at 15 dB SNR. In the plot to the left, a least-squares linear 
fit is shown with a dashed line. 
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where bc(c, u), bMi (Xi , c^) for i £ V and b Uk (u k ) for k £ K, 
are factor beliefs. With abuse of notation we let q(-) denote 
variable beliefs and the notation (•) denote expectation with 
respect to the belief density q(a). 


Appendix B 

Efficient Calculation of p^ When L is Large 


In this appendix we present a computationally efficient 
method for evaluating p: as defined by We first note 
that direct evaluation and inversion of Q has time complexity 
0(L 2 N), where L = |4|. The method we present in the 
following is an iterative method, which, assuming Conjecture 
[l] holds, has complexity 0(LN\j r N). It is thus beneficial to 
use the presented method when L grows faster than \fN. 

We first use the Woodbury matrix identity to write p as 

p = $- l fj (i - C- 1 *^)) H> h (t^) <X>^ y, 

where 


c = (X H X)~ 

\ /a 




We immediately recognize that the computationally dominat¬ 
ing part is to solve a system of N linear equations of the 
form Cz = a. Since C is Hermitian and positive-definite, we 
can solve this system via the conjugate gradient (CG) method 
(Alg. 2.1 in [55]), which is an iterative method for solving 


A m ax(T) = 0(N). 

To justify this conjecture we refer to Fig. [8] where the largest 
eigenvalue is shown for both varying N and varying number 
of multipath components. A clear linear dependence on N is 
seen, while the largest eigenvalue does not increase (in fact it 
decreases) with the number of multipath components L. 

We also need a number of lemmas. 

Lemma 2: There exists constants Ci > 0 and c.-i < oo, such 
that ci < {\ x i\ 2 ) XT> < C 2 for all i G V U V. 

Proof: Observe that the data and pilot modulation sym¬ 
bol alphabets Ad and Ap only contain finite, non-zero val¬ 
ues. We can thus take Ci = min xe A P uA D |aj 2 and C 2 = 

max X £ApUA D \x\ 2 to complete the proof. ■ 

Lemma 3: (Assumes Conjecture |T] holds.) The largest and 
smallest eigenvalues of C obey 

Amax(C) = 0(N), (35) 

Amin(C) > q 1 . (36) 


Proof: By the Weyl inequality for Hermitian matrices C, 
T and (X H X) _1 ; 

A max (C) < A max (<X H X)^) +A max (T). 

Now ( |35| > follows directly from Conjecture [T] and Lemma [2] 
Similarly by the dual Weyl inequality: 

Amin(C) > A min (<X H X)^) + A min (T). 


Since L < N, the matrix T is singular and A m j n (T) = 0. The 
inequality ( [36] ) now follows from Lemma [2] ■ 

By Theorem 2.2 in [55), the number of iterations required by 
the CG method to achieve a desired accuracy in the solution 


of a = Cz is O 


( / Ama: 

V Ami, 


(C) 


(C) 


. By Lemma 


the number of 


iterations is thus 0(y/N). Each iteration has time complexity 
O(LN) and the overall complexity of solving ( |3 1 [ i via this 
method is therefore 0(LNy/N). 
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